Abstract

This paper considers the support of real-time applications
in an Integrated Services Packet Network (ISPN). We first
review the characteristics of real-time applications. We
observe that, contrary to the popular view that real-time
applications necessarily require a fixed delay bound, some
real-time applications are more flexible and can adapt to current
network conditions. We then propose an ISPN architecture
that supports two distinct kinds of real-time service:
guaranteed service, which is the traditional form of real-time
service discussed in most of the literature and involves
pre-computed worst-case delay bounds, and predicted service,
which uses the measured performance of the network in
computing delay bounds. We then propose a packet scheduling
mechanism that can support both of these real-time services
as well as accommodate datagram traffic. We also discuss
two other aspects of an overall ISPN architecture: the
service interface and the admission control criteria.

1  Introduction 

The current generation of telephone networks and the current
generation of computer networks were each designed to
carry specific and very different kinds of traffic: analog voice
and digital data. However, with the digitizing of telephony
in ISDN and the increasing use of multi-media in computer
applications, this distinction is rapidly disappearing. Merging
these sorts of services into a single network, which we
refer to here as an Integrated Services Packet Network (ISPN),
would yield a single telecommunications infrastructure offering
a multitude of advantages, including vast economies of
scale, ubiquity of access, and improved statistical multiplexing.
There is a broad consensus, at least in the computer
networking community, that an ISPN is both a worthy and
an achievable goal. However, there are many political,
administrative, and technical hurdles to overcome before this
vision can become a reality.

^Research at MIT was supported by DARPA through NASA Grant
NAG 2-582, by NSF grant NCR-8814187, and by DARPA and NSF
through Cooperative Agreement NCR-8919038 with the Corporation
for National Research Initiatives.


One of the most vexing technical problems that blocks
the path towards an ISPN is that of supporting real-time
applications in a packet network. Real-time applications
are quite different from standard data applications, and
require service that cannot be delivered within the typical data
service architecture. In Section 2 we discuss the nature of
real-time applications at length; here, however, it suffices
to observe that one salient characteristic of the real-time
applications we consider is that they require a bound on
the delivery delay of each packet^. While this bound may
be statistical, in the sense that some small fraction of the
packets may fail to arrive by this bound, the bound itself
must be known a priori. The traditional data service
architecture underlying computer networks has no facilities for
prescheduling resources or denying service upon overload,
and thus is unable to meet this real-time requirement.

Therefore, in order to handle real-time traffic, an enhanced
architecture is needed for an ISPN. We identify four
key components to this architecture. The first piece of the
architecture is the nature of the commitments made by the
network when it promises to deliver a certain quality of
service. We identify two sorts of commitments, guaranteed and
predicted. Predicted service is a major aspect of our paper.
While the idea of predicted service has been considered
before, the issues that surround it have not, to our knowledge,
been carefully explored.

The second piece of the architecture is the service interface,
i.e., the set of parameters passed between the source
and the network. The service interface must include both
the characterization of the quality of service the network will
deliver, fulfilling the need of applications to know when their
packets will arrive, and the characterization of the source's
traffic, thereby allowing the network to knowledgeably
allocate resources. In this paper we attempt to identify the
critical aspects of the service interface, and offer a particular
interface as an example. We address in passing the need for
enforcement of these characterizations.

The third piece of the architecture is the packet scheduling
behavior of network switches needed to meet these service
commitments. We discuss both the actual scheduling
algorithms to be used in the switches, as well as the scheduling
information that must be carried in packet headers. This
part of the architecture must be carefully considered; since it
must be executed for every packet it must not be so complex
as to affect overall network performance.

^Since the term bound is tossed around with great abandon in the
rest of the paper, we need to identify several different meanings to
the term. An a priori bound on delay is a statement that none of
the future delays will exceed that amount. A post facto bound is the
maximal value of a set of observed delays. Statistical bounds allow
for a certain percentage of violations of the bound; absolute bounds
allow none.

The final part of the architecture is the means by which
the traffic and service commitments get established. Clearly,
the ability of the network to meet its service commitments is
related to the criteria the network uses to decide whether to
accept another request for service. While we do not present
a specific algorithm to regulate the admission of new sources,
we show the relation between the other parts of our proposal
and a general approach to the admission control problem.

There are also many architectural issues not directly
related to the nature of real-time traffic; for instance, the
issues of routing and interaction of administrative domains
all pose interesting challenges. We do not address these
issues in this paper, and any final architectural proposal for
an ISPN must solve these longstanding problems. It is
important to note, however, that we do not believe that the
architectural choices we advocate here for real-time traffic
unnecessarily restrict the scope of solutions to these other
problems.

This paper has 12 Sections and an Appendix. In Section
2 we begin with a discussion of the nature of real-time traffic.
In particular, we note that some real-time applications
can adapt to current network conditions. This leads us to
propose, in Section 3, that the ISPN support two kinds of
real-time service commitments: guaranteed service and
predicted service. In Section 4 we present a time-stamp based
scheduling algorithm which is a nonuniformly weighted
version of the Fair Queueing algorithm discussed in Reference
[4], and then refer to a recent result due to Parekh and
Gallager (see References [19, 20]) which states that, under
certain conditions, this algorithm delivers guaranteed service in
a network of arbitrary topology. We then turn, in Sections
5 and 6, to the scheduling algorithms best suited for
providing predicted service. We combine these two scheduling
algorithms in Section 7, presenting a unified scheduling
algorithm which provides both guaranteed and predicted service.
The scheduling algorithm incorporates two novel ideas: that
of using FIFO service in a real-time context, and that of
correlating the queueing delay of a packet at successive nodes
in its path to reduce delay jitter. Given the current frenzy
of activity in the design of real-time scheduling algorithms,
we do not expect that the algorithm presented here will be
the final word on the matter; however, we do hope that the
insight embodied therein will be of lasting value. In
particular, we think that the insight underlying our design, that it
is necessary to distinguish between the two basic principles
of isolation and sharing, is both fundamental and novel.

In Section 8 we return to the issue of the service interface.
Since the service interface will be invoked by applications,
we expect that a real-time service interface will outlive any
particular underlying network mechanism. Thus, we have
attempted in our proposal to produce an interface which is
flexible enough to accommodate a wide variety of supporting
mechanisms. Admission control policies are discussed briefly
in Section 9, and the support of other service qualities is
covered in Section 10.

In order to build up sufficient context to meaningfully
compare our work to previously published work, we
delay the detailed discussion of related work until Section 11.
However, we wish to note here that our work borrows heavily
from the rapidly growing literature on providing real-time
service in packet networks. In particular, the works of
Parekh and Gallager ([20, 19]), Jacobson and Floyd ([14]),
and Lazar, Hyman, and Pacifici ([12, 13]) have all
contributed to our design.

Finally, in Section 12, we conclude our paper with a
review of our results and a brief discussion of related economic
issues. The Appendix contains details relating to the
simulation results that are presented in Sections 5-7.

2  Properties  of  Real-Time  Traffic 

2.1  A  Class  of  Real-Time  Applications 

In the discussion that follows, we focus on a particular class
of real-time application which we dub play-back applications.
In a play-back application, the source takes some signal,
packetizes it, and then transmits it over the network. The
network inevitably introduces some variation in the delay of
each delivered packet. This variation has traditionally been
called jitter. The receiver depacketizes the data and then
attempts to faithfully play back the signal. This is done
by buffering the incoming data to remove the network-induced
jitter and then replaying the signal at some designated
play-back point. Any data that arrives before its associated
play-back point can be used to reconstruct the signal; data
arriving after the play-back point is useless in reconstructing
the real-time signal. For the purposes of this paper, we
assume that all such applications have sufficient buffering to
store all packets which arrive before the play-back point; we
return to this point in Section 10.

Not all real-time applications are play-back applications
(for example, one might imagine a visualization application
which merely displayed the image encoded in each packet
whenever it arrived). However, we believe the vast majority
of future real-time applications, including most video and
audio applications, will fit this paradigm. Furthermore,
non-play-back applications can still use the real-time network
service provided by our architecture, although this service
is not specifically tailored to their needs.

Play-back real-time applications have several service
requirements that inform our design proposal. First, since
there is often real-time interaction between the two ends
of an application, as in a voice conversation, the application
performance is sensitive to the data delivery delay; in general
lower delay is much preferable. Second, in order to set the
play-back point, the application needs to have some
information (preferably an absolute or statistical bound) about
the delays that each packet will experience. Third, since all
data is buffered until the play-back point, the application is
indifferent as to when data is delivered as long as it arrives
before the play-back point^. This turns out to be a crucial
point, as it allows us to delay certain packets which are in no
danger of missing their play-back point in favor of packets
which are. Fourth, these play-back applications can often
tolerate the loss of a certain fraction of packets with only a
minimal distortion in the signal. Therefore, the play-back
point need not be so delayed that absolutely every packet
arrives beforehand.

2.2  The  Nature  of  Delay 

The delay in the network derives from several causes. There
is in practice a large fixed component to the delay, caused
by the propagation of the packet at the speed of light, and
the delay in transmission at each switch point waiting for
the entire packet to arrive before commencing the next stage
of transmission. (Cut-through networks avoid this delay by
starting transmission before receipt is complete; most packet
networks are not cut-through.) Added to this fixed delay is a
variable amount of delay related to the time that each packet
spends in service queues in the switches. This variation, or
jitter, is what must be bounded and minimized if adequate
real-time service is to be achieved.

^This is where we invoke the assumption, mentioned previously,
that the receiver has sufficient buffers.
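As a rough illustration of this decomposition, the sketch below adds the propagation time to the per-link store-and-forward transmission times to obtain the fixed component, and measures jitter as the spread of a set of observed delays. The function names, the default propagation speed of 2.0e8 m/s, and the example figures are assumptions made for illustration; they do not come from the paper.

```python
def fixed_delay(path_length_m, packet_bits, link_rates_bps,
                propagation_speed_mps=2.0e8):
    """Fixed part of the end-to-end delay, in seconds.

    Assumes a store-and-forward path: the whole packet is received on
    each link before it is forwarded, so the transmission time is paid
    once per link. The propagation speed is an illustrative assumption.
    """
    propagation = path_length_m / propagation_speed_mps
    store_and_forward = sum(packet_bits / rate for rate in link_rates_bps)
    return propagation + store_and_forward


def jitter(delays):
    """Spread between the largest and smallest observed delay."""
    if not delays:
        raise ValueError("need at least one delay sample")
    return max(delays) - min(delays)


if __name__ == "__main__":
    # Hypothetical path: 4000 km, 8000-bit packets, two 1 Mbit/s links.
    base = fixed_delay(4.0e6, 8000, [1.0e6, 1.0e6])
    print(f"fixed delay: {base * 1000:.1f} ms")
    # Observed delays are the fixed part plus a variable queueing part.
    print(f"jitter: {jitter([0.040, 0.036, 0.051]) * 1000:.1f} ms")
```

Only the queueing term varies from packet to packet, which is why the scheduling discipline at the switches is the natural place to control jitter.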

Queueing is a fundamental consequence of the statistical
sharing that occurs in packet networks. One way to reduce
jitter might be to eliminate the statistical behavior of the
sources. Indeed, one misconception is that real-time sources
cannot be bursty (variable in their transmission rate), but
must transmit at a fixed invariant rate to achieve a real-time
service. We reject this idea; allowing sources to have bursty
transmission rates and to take advantage of statistical
sharing is a major advantage of packet networks. Our approach
is thus to bound and characterize the burstiness, rather than
eliminate it.

The idea of statistical sharing implies that there are
indeed several sources using the bandwidth; one cannot share
alone. Our approach to real-time traffic thus looks at the
aggregation of traffic as fundamental; the network must be
shared in such a way that clients (1) get better service than
if there were no sharing (as in a circuit switched or TDM
network) and (2) are protected from the potentially negative
effects of sharing (most obviously the disruption of service
caused by sharing with a misbehaving source that overloads
the resource).

2.3  Dealing  with  Delay 

In order for an application to predict its level of performance
with a given quality of network service, it needs to
determine, to achieve satisfactory performance, what fraction of
its packets must arrive before the play-back point, and it
needs to know where to set its play-back point. Thus, some
bound on the delay, plus an estimate of the fraction of
packets missing that bound, forms the nucleus of the network's
service specification in the service interface (to be discussed
more fully in Section 8).

Some real-time applications will use an a priori delay
bound advertised by the network to set the play-back point
and will keep the play-back point fixed regardless of the
actual delays experienced. These we dub rigid applications.
For other applications, the receiver will measure the network
delay experienced by arriving packets and then adaptively
move the play-back point to the minimal delay that still
produces a sufficiently low loss rate. We call such applications
adaptive. Notice that adaptive applications will typically
have an earlier play-back point than rigid applications, and
thus will suffer less performance degradation due to delay.
This is because the client's estimate of the de facto bound
on actual delay will likely be less than the a priori bound
pre-computed by the network. On the other hand, since
the adaptation process is not perfect and may occasionally
set the play-back point too early, adaptive applications will
likely experience some amount of losses.
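The adaptive behavior described above can be sketched as a receiver that keeps a sliding window of recently measured delays and places the play-back point at the smallest delay that would have kept the late fraction of that window within a target. This is one possible adaptation rule under stated assumptions, not the mechanism of any particular application; the class name, the window size, and the target loss rate are all hypothetical.

```python
from collections import deque


class AdaptivePlayback:
    """Tracks recent packet delays and suggests a play-back point.

    Hypothetical sketch: the window length and target loss rate are
    illustrative parameters, not values taken from the paper.
    """

    def __init__(self, window=100, target_loss=0.01):
        self.samples = deque(maxlen=window)  # most recent delays only
        self.target_loss = target_loss

    def observe(self, delay):
        """Record the measured network delay of an arriving packet."""
        self.samples.append(delay)

    def playback_point(self):
        """Smallest delay such that at most target_loss of the window is late."""
        if not self.samples:
            raise ValueError("no delay samples observed yet")
        ordered = sorted(self.samples)
        # Number of window samples that are allowed to miss the point.
        allowed_late = min(int(self.target_loss * len(ordered)),
                           len(ordered) - 1)
        return ordered[len(ordered) - 1 - allowed_late]


if __name__ == "__main__":
    receiver = AdaptivePlayback(window=10, target_loss=0.1)
    for delay_ms in range(20, 30):      # delays of 20..29 ms
        receiver.observe(delay_ms)
    print(receiver.playback_point())    # tracks the recent delays
    for delay_ms in range(40, 50):      # network conditions worsen
        receiver.observe(delay_ms)
    print(receiver.playback_point())    # point moves later after the window refills
```

A rigid application would instead fix the play-back point at the advertised a priori bound and never call `playback_point` again. The sketch also shows why adaptation is imperfect: while the window still holds old samples, packets delayed by the new conditions miss the play-back point.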

The idea of adaptive applications is not relevant to
circuit switched networks, which do not have jitter due to
queueing. Thus most real-time devices today, like voice
and video codecs, are not adaptive. Lack of widespread
experience may raise the concern that adaptive applications
will be difficult to build. However, early experiments
suggest that it is actually rather easy. Video can be made
to adapt by dropping or replaying a frame as necessary,
and voice can adapt imperceptibly by adjusting silent
periods. In fact, such adaptive approaches have been applied
to implement packetized voice applications since the early 1970s
(cite Weinstein); the VT ([2]) and VAT ([15]) packet voice
protocols, which are currently used to transmit voice on the
Internet, are living examples of such adaptive applications^.
It is important to note that while adaptive applications can
adjust to the delivered delays over some range, there are
typically limits to this adaptability; for instance, once the
delay reaches a certain level, it would become difficult to
carry out interactive conversations.

Another useful distinction between network clients is how
tolerant they are to brief interruptions in service. This level
of tolerance is not just a function of the application, but
also of the end users involved. For instance, a video
conference allowing one surgeon to remotely assist another during
an operation will not be tolerant of any interruption of
service, whereas a video conference-based family reunion might
happily tolerate interruptions in service (as long as it was
reflected in a cheaper service rate).

We can thus characterize network clients along two axes:
adaptive or rigid, and tolerant or intolerant. It is unlikely
that an intolerant network client is adaptive, since the
adaptive process will likely lead, in the event of rapidly changing
network conditions, to a brief interruption in service while
the play-back point is re-adjusting. Furthermore, a tolerant
client that is rigid is merely losing the chance to improve its
delay. Such a combination of tolerance and rigidity would
probably reflect the lack of adaptive hardware and software,
which we believe will soon be cheap and standard enough to
become fairly ubiquitous. We are thus led to the prediction
that there will be two dominant classes of traffic in the
network: intolerant and rigid clients, and tolerant and adaptive
clients. We predict that these two classes will likely request
very different service commitments from the network. Thus,
these basic considerations about delay and how clients deal
with it have produced a taxonomy of network clients that
guides the goals of our architecture.

Before turning to the issue of service commitments, let
us note that one of the key differences between real-time
applications and the traditional datagram applications lies
in the nature of the offered traffic. Data traffic is typically
sporadic and unpredictable. In contrast, real-time
applications often have some intrinsic packet generation process
which is long lasting compared to the end-to-end delays of
the individual packets. This process is a consequence of the
specifics of the application; for example the coding algorithm
for video, along with the nature of the image, will determine
the packet generation process. Furthermore, the
characterization of this generation process can often be closely
represented by some traffic filter (such as a token bucket to
be described later), and/or be derived from measurement.
When a network has some knowledge of the traffic load it
will have to carry, it can allocate its resources in a much
more efficient manner.

3  Service  Commitments 

Clearly, for a network to make a service commitment to a
particular client, it must know beforehand some
characterization of the traffic that will be offered by that client. For
the network to reliably meet its service commitment, the
client must meet its traffic commitment (i.e., its traffic must
conform to the characterization it has passed to the
network). Thus, the service commitment made to a particular
client is predicated on the traffic commitment of that client.
The question is, what else is the service commitment
predicated on (besides the obvious requirement that the network
hardware function properly)?

^Yet another example of an adaptive packet voice application is
described in Reference [5].

One kind of service commitment, which we will call
guaranteed service, depends on no other assumptions. That is,
if the network hardware is functioning and the client is
conforming to its traffic characterization, then the service
commitment will be met. Notice that this level of commitment
does not require that any other network clients conform to
their traffic commitments. Guaranteed service is appropriate
for intolerant and rigid clients, since they need absolute
assurances about the service they receive.

However, guaranteed service is not necessarily appropriate
for tolerant and adaptive clients. Adaptive clients, by
adjusting their play-back point to reflect the delays their
packets are currently receiving, are gambling that the
network service in the near future will be similar to that
delivered in the recent past. Any violation of that assumption in
the direction of increased delays will result in a brief
degradation in the application's performance as packets begin
missing the play-back point. The client will then readjust
the play-back point upward to reflect the change in service,
but there will necessarily be some momentary disruption
in service. This will occur even if the network is meeting
its nominal service commitments (based on the bounds on
the service), because an adaptive application is typically
ignoring those a priori bounds on delay and adapting to the
current delivered service.

Thus, as long as the application is gambling that the
recent past is a guide to the near future, one might as well
define a class of service commitment that makes the same
gamble. Our second kind of service commitment is called
predicted service. This level of commitment has two
components. First, as stated above, the network commits that
if the past is a guide to the future, then the network will
meet its service characterization. This component embodies
the fact that the network can take into account recent
measurement on the traffic load in guessing what kind of
service it can deliver reliably. This is in marked contrast
to the worst-case analysis that underlies the guaranteed
service commitment. Second, the network attempts to deliver
service that will allow the adaptive algorithms to minimize
their play-back points. (This is the same as saying that the
service will attempt to minimize the post facto delay bound.)
Obviously, when the overall network conditions change, the
quality of service must also change; the intent of the second
component of the commitment is that when network
conditions are relatively static, the network schedules packets
so that the current post facto delay bounds (which are
typically well under the long-term a priori bounds that are part
of the service commitment) are small.

Notice that predicted service has built into it very strong
implicit assumptions about the behavior of other network
clients by assuming that the network conditions will remain
relatively unchanged, but involves very few explicit
assumptions about these other network clients; i.e., their current
behavior need not be explicitly characterized in any precise
manner. Thus, for predicted service, the network takes steps
to deliver consistent performance to the client; it avoids the
hard problem, which must be faced with guaranteed service,
of trying to compute a priori what that level of delivered
service will be.

We have thus defined two sorts of real-time traffic, which
differ in terms of the service commitment they receive. There
is a third class of traffic that we call datagram traffic, to
which the network makes no service commitments at all,
except to promise not to delay or drop packets unnecessarily
(this is sometimes called best effort service).

We now have the first component of our architecture,
the nature of the service commitment. The challenge, now,
is to schedule the packet departures at each switch so that
these commitments are kept. For the sake of clarity, we first
consider, in Section 4, how to schedule guaranteed traffic
in a network carrying only guaranteed traffic. In Sections
5 and 6 we then consider how to schedule predicted traffic
in a network carrying only predicted traffic. After we have
assembled the necessary components of our scheduling
algorithm we then, in Section 7, present our unified scheduling
algorithm which simultaneously handles all three levels of
service commitment.

As we present these scheduling schemes, we also lay the
groundwork for the other key pieces of the architecture, the
specifics of the service interface (which must relate closely
to the details of the service commitment) and the method
to control the admission of new sources.

4  Scheduling  Algorithms  for  Guaranteed  Traffic 

In this section we first describe a traffic filter and then a
scheduling algorithm that together provide guaranteed
service.

As discussed briefly in Section 3, a network client must
characterize its traffic load to the network, so that the
network can commit bandwidth and manage queues in a way
that realizes the service commitment. We use a particular
form of traffic characterization called a token bucket filter.
A token bucket filter is characterized by two parameters, a
rate r and a depth b. One can think of the token bucket
as filling up with tokens continuously at a rate r, with b
being its maximal depth. Every time a packet is generated
it removes p tokens from the bucket, where p is the size
of the packet. A traffic source conforms to a token bucket
filter (r, b) if there are always enough tokens in the bucket
whenever a packet is generated.

More precisely, consider a packet generation process with
$t_i$ and $p_i$ denoting the generation time and size, respectively,
of the $i$'th packet. We say that this traffic source conforms
to a token bucket filter $(r, b)$ of rate $r$ and depth $b$ if the
sequence $n_i$ defined by $n_0 = b$ and
$n_i = \mathrm{MIN}[b,\; n_{i-1} + (t_i - t_{i-1})\,r - p_i]$
obeys the constraint that $n_i \ge 0$ for all
$i$. The quantities $n_i$, if nonnegative, represent the number
of tokens residing in the bucket after the $i$'th packet leaves.
For a given traffic generation process, we can define the
non-increasing function $b(r)$ as the minimal value such that the
process conforms to a $(r, b(r))$ filter.
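The conformance test and the function b(r) can be illustrated with a short sketch. It applies the token recurrence exactly as written in the text, with the cap at b taken after the packet's tokens are removed. It also assumes that the bucket is full when the first packet is generated, and that packets are given as (time, size) pairs sorted by time. The function names are hypothetical. For b(r), the sketch tracks the deficit b - n_i, which obeys a recurrence that does not depend on b, so the minimal depth is the largest deficit ever reached.

```python
def conforms(packets, r, b):
    """True if the packet sequence conforms to a token bucket filter (r, b).

    packets: iterable of (generation_time, size) pairs, sorted by time.
    Assumes the bucket holds b tokens when the first packet is generated.
    """
    tokens = b
    prev_time = None
    for t, p in packets:
        elapsed = 0.0 if prev_time is None else t - prev_time
        # Token count after this packet leaves, as in the text's recurrence.
        tokens = min(b, tokens + elapsed * r - p)
        if tokens < 0:
            return False
        prev_time = t
    return True


def minimal_depth(packets, r):
    """b(r): the smallest depth b for which the sequence conforms to (r, b).

    Tracks the deficit d_i = b - n_i, which follows
    d_i = max(0, d_{i-1} - (t_i - t_{i-1}) * r + p_i) with d_0 = 0.
    """
    deficit = 0.0
    worst = 0.0
    prev_time = None
    for t, p in packets:
        elapsed = 0.0 if prev_time is None else t - prev_time
        deficit = max(0.0, deficit - elapsed * r + p)
        worst = max(worst, deficit)
        prev_time = t
    return worst


if __name__ == "__main__":
    # Hypothetical source: a burst of two packets, then two spaced packets.
    trace = [(0.0, 1000), (0.0, 1000), (1.0, 1000), (3.0, 1000)]
    for rate in (500, 1000):
        depth = minimal_depth(trace, rate)
        print(rate, depth, conforms(trace, rate, depth))
```

Running the example with a larger rate yields a smaller required depth, which matches the statement that b(r) is non-increasing.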

In recent years, several time-stamp based algorithms have
been developed. These algorithms take as input some
preassigned apportionment of the link expressed as a set of rates
$r^\alpha$ (where $\alpha$ labels the flows); the resulting delays depend
on the bucket sizes $b^\alpha(r^\alpha)$.

One of the first such time-stamp algorithms was the Fair
Queueing algorithm introduced in Reference [4]. This
algorithm was targeted at the traditional data service
architecture, and so involved no preallocation of resources (and
thus had each $r^\alpha = \mu$ where $\mu$ denotes the link speed).
In addition, a weighted version of the Fair Queueing
algorithm (which we refer to as WFQ), in which the $r^\alpha$ need
not all be equal, was also briefly described in Reference [4]^.
The VirtualClock algorithm, described in References [25, 26],
involves an extremely similar underlying packet scheduling
algorithm, but was expressly designed for a context where
resources were preapportioned and thus had as a fundamental
part of its architecture the assumption that the shares
$r^\alpha$ were arbitrary. Parekh and Gallager, in Reference [19],
reintroduce the WFQ algorithm under the name of packetized
generalized processor sharing (PGPS). They have proven an
important result that this algorithm, under certain
conditions, can deliver a guaranteed quality of service ([20]). We
present a brief summary of the WFQ algorithm below, since
we make use of it in our overall scheduling algorithm; see
References [4, 20] for more details.

First, consider some set of flows and a set of clock rates
$r^\alpha$. The clock rate of a flow represents the relative
share of the link bandwidth this flow is entitled to; more
properly, it represents the proportion of the total link
bandwidth which this flow will receive when it is active. By
assigning it a clock rate $r^\alpha$, the network commits to
provide to this flow an effective throughput rate no worse than
$\mu r^\alpha / \sum_\beta r^\beta$, where the sum in the
denominator is over all currently active flows.
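
The commitment above is easy to evaluate numerically. The following short Python sketch computes $\mu r^\alpha / \sum_\beta r^\beta$ for a given set of active flows; the flow names, rates, and link speed are hypothetical values chosen only for illustration.

```python
def guaranteed_throughput(link_speed, clock_rates, active, flow):
    """Throughput committed to `flow` while the flows in `active` are busy.

    link_speed  -- mu, in bits per second
    clock_rates -- mapping from flow id to its clock rate r
    active      -- ids of the currently active flows (must include `flow`)
    """
    total = sum(clock_rates[f] for f in active)
    return link_speed * clock_rates[flow] / total


rates = {"a": 200_000, "b": 300_000, "c": 500_000}   # hypothetical clock rates
mu = 1_000_000                                        # hypothetical link speed

print(guaranteed_throughput(mu, rates, {"a", "b", "c"}, "a"))
print(guaranteed_throughput(mu, rates, {"a", "b"}, "a"))
```

When some flows are idle, the sum in the denominator shrinks and the remaining flows receive more than their nominal share, which is why the commitment is stated as a lower bound.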

This formulation can be made precise in the context of a
fluid flow model of the network, where the bits drain
continuously out of the queue. Let $t_i^\alpha$ and $p_i^\alpha$
denote the generation time and size, respectively, of the $i$th
packet arriving in the $\alpha$th flow. We define the set of
functions $m^\alpha(t)$, which characterize at any time the
backlog of bits which each source has to send, and set
$m^\alpha(0) = 0$. We say that a flow is active at time $t$ if
$m^\alpha(t) > 0$; let $A(t)$ denote the set of active flows.
Then the dynamics of the system are determined as follows.
Whenever a packet arrives, $m$ must discontinuously increase by
the packet size: $m^\alpha(t^+) = m^\alpha(t^-) + p_i^\alpha$ if
$t = t_i^\alpha$, where $m^\alpha(t^+)$ and $m^\alpha(t^-)$ refer
to right hand and left hand limits of $m^\alpha$ at $t$. At all
other times, we know that the bits are draining out of the
queues of the active flows in proportion to the clock rates of
the respective flows:
$$\frac{\partial m^\alpha(t)}{\partial t} =
\begin{cases}
-\mu r^\alpha / \sum_{\beta \in A(t)} r^\beta & \text{if } \alpha \in A(t), \\
0 & \text{otherwise.}
\end{cases}$$

This completely characterizes the dynamics of the fluid
flow model. Parekh and Gallager have shown the remarkable
result that, in a network with arbitrary topology, if a flow
gets the same clock rate at every switch and the sum of the
clock rates of all the flows at every switch is no greater than
the link speed, then the queueing delay of that flow is bounded
above by $b^\alpha(r^\alpha)/r^\alpha$. Intuitively, this bound
is the delay that would result from an instantaneous packet
burst of the token bucket size being serviced by a single link
of rate $r^\alpha$; the queueing delays are no worse than if the
entire network were replaced by a single link with a speed equal
to the flow's clock rate $r^\alpha$. This result can be
motivated by noting that if the source traffic were put through
a leaky bucket filter of rate $r$ at the edge of the network**,
then the flow would not suffer any further queueing delays
within the network since the instantaneous service rate given
to this flow at every switch along the path would be at least
$r$. Thus, all of the queueing delay would occur in the leaky
bucket filter and, if the flow obeyed an $(r, b)$ token bucket
filter, then the delay in the leaky bucket filter would be
bounded by $b/r$. Notice that the delay bound of a particular
flow is independent of the other flows' characteristics; they
can be arbitrarily badly behaved and the bound still applies.
Furthermore, these bounds are strict, in that they can be
realized with a set of greedy sources which keep their token
buckets empty.

*The weighted version of Fair Queueing is mentioned on page 24
of Reference [4], though not referred to by the name Weighted
Fair Queueing.

**In a fluid flow version of a leaky bucket of rate $r$, the bits
drain out at a constant rate $r$ and any excess is queued.

The previous paragraphs describe WFQ in the fluid flow
approximation. One can define the packetized version of the
algorithm in a straightforward manner. Define
$\delta_i^\alpha(t)$ for all $t > t_i^\alpha$ as the number of
bits that have been serviced from the flow $\alpha$ between the
times $t_i^\alpha$ and $t$. Associate with each packet the
function $E_i^\alpha(t) = (m^\alpha(t_i^\alpha) -
\delta_i^\alpha(t))/r^\alpha$, where we take the right-hand
limit of $m$; this number is the level of backlog ahead of the
packet $i$ in the flow $\alpha$'s queue divided by the flow's
share of the link, and can be thought of as an expected delay
until departure for the last bit in the packet. The packetized
version of WFQ is merely, at any time $t$ when the next packet
to be transmitted must be chosen, to select the packet with the
minimal $E_i^\alpha(t)$. This algorithm is called a time-stamp
based scheme because there is an alternative but equivalent
formulation in which each packet is stamped with a time-stamp as
it arrives and then packets are transmitted in increasing order
of time-stamps; see References [4, 20] for details on this
formulation. Parekh and Gallager have shown that a bound,
similar to the fluid flow bound, applies to this packetized
algorithm as well. However, the formulae for the delays in the
packetized case are significantly more complicated; see
Reference [20] for details.
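
The selection rule can be sketched in a few lines of Python. This is only an illustration of choosing the packet with minimal $E_i^\alpha(t)$ from quantities that are assumed to be tracked elsewhere; the class and field names are hypothetical, and the bookkeeping that maintains the backlog and service counters is not shown.

```python
from dataclasses import dataclass


@dataclass
class QueuedPacket:
    flow: str
    backlog_at_arrival: float  # m(t_i+): flow backlog in bits, including this packet
    served_since: float        # delta_i(t): bits served from the flow since arrival
    clock_rate: float          # r: the flow's clock rate in bits per second

    def expected_delay(self) -> float:
        """E_i(t) = (m(t_i+) - delta_i(t)) / r."""
        return (self.backlog_at_arrival - self.served_since) / self.clock_rate


def next_packet(candidates):
    """Pick the queued packet with the smallest expected delay E_i(t)."""
    return min(candidates, key=QueuedPacket.expected_delay)


# Hypothetical snapshot of two head-of-line packets.
a = QueuedPacket("a", backlog_at_arrival=3000, served_since=1000, clock_rate=1000)
b = QueuedPacket("b", backlog_at_arrival=1500, served_since=0, clock_rate=500)
print(next_packet([a, b]).flow)
```

In this snapshot flow a has more bits outstanding than flow b, but its larger clock rate gives it the smaller expected delay, so its packet is sent first.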

To understand the relation between the clock rate $r$, the
bucket size $b(r)$ and the resultant delay, consider what
happens to a burst of packets. The packet that receives the
highest queueing delay is the last packet of a burst. The bound
on the jitter is proportional to the size of the burst and
inversely proportional to the clock rate. The means by which the
source can improve the worst case bound is to increase its $r$
parameter to permit the burst to pass through the network more
quickly.
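
A small numerical sketch makes the trade-off concrete. The Python fragment below evaluates the fluid-model bound $b/r$ for a hypothetical burst at several clock rates; for simplicity it holds $b$ fixed, although $b(r)$ is nonincreasing in $r$, so the true improvement can only be at least this large.

```python
def delay_bound(bucket_bits, clock_rate_bps):
    """Fluid-model queueing delay bound b / r, in seconds."""
    if clock_rate_bps <= 0:
        raise ValueError("clock rate must be positive")
    return bucket_bits / clock_rate_bps


burst = 50_000  # hypothetical bucket size in bits
for rate in (100_000, 200_000, 400_000):
    print(f"r = {rate} bit/s -> bound = {delay_bound(burst, rate)} s")
```

Doubling the clock rate halves the bound, at the cost of reserving bandwidth that the source uses only during bursts.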

Since the bounds given in guaranteed service must be
worst-case bounds (i.e. the bounds must apply for all possible
behaviors of the other sources), the primary function of a
scheduling algorithm designed to deliver guaranteed service is
to isolate flows from each other, so that a flow can have only a
limited negative effect on other flows. The WFQ scheme isolates
each source from the others by providing it a specified share of
the bandwidth under overload conditions. The work of Parekh and
Gallager provides a way for the source to compute the maximum
queueing delay which its packets will encounter, provided that
the source restricts itself to an $(r, b)$ token bucket filter.
But the network's scheduling algorithm does not depend on this
filter. Indeed, an important point about this form of guaranteed
service is that the traffic filters do not play any role in
packet scheduling.

Given that there exists an algorithm that can deliver
guaranteed service, why not deliver guaranteed service to all
clients? Most bursty statistical generation processes are such
that any $r$ which produces a reasonably small ratio $b(r)/r$
(so that the resulting delay bound is reasonable) is much
greater than the average data rate of that source. Thus, if
guaranteed service were the only form of real-time service
available, then the overall level of network utilization due to
real-time traffic would be well under capacity, perhaps 50% or
less.

One design alternative would be to assume that datagram
traffic would comprise the rest of the traffic, filling up the
unused capacity. While we certainly do not have a clear picture
of the nature of the offered load in future ISPNs, we think that
basing the case for a scheduling algorithm on the expectation
that the volume of datagram traffic will fill half the capacity
of the network is, at best, a gamble. Consequently, we propose
to also offer another class of real-time service, predicted
service. Note that in offering this service we are attempting to
increase the utilization of the network while still meeting the
service needs of real-time clients.

5  Scheduling  Algorithms  for  Predicted  Service 

We motivate the development of our scheduling algorithm
by considering the following gedanken experiment. Consider a
single-link network carrying some number of clients, and assume
that all sources conform to some traffic filter such as the
token buckets described above. Furthermore, assume that all the
clients are bursty sources, and wish to mix their traffic so
that in the aggregate they achieve a better use of bandwidth and
a controlled delay. How does one best schedule the packets to
achieve low post facto delay bounds (or, equivalently, minimal
play-back points)?

What behavior does the WFQ algorithm induce? Whenever
there is a backlog in the queue, packets leave the queue at
rates proportional to their clock rates. Consider a moment when
all sources are transmitting uniformly at their clock rates
except for one which emits a burst of packets. The WFQ algorithm
would continue to send the packets from the uniform sources at
their clock rates, so their packets are not queued for any
significant time, whereas the backlog of packets from the bursty
source will take a long time to drain. Thus, a burst by one
source causes a sharp increase in the delay seen by that source,
and has minimal effects on the delays seen by the other sources.
The median delay will be rather low, assuming the network link
is not over-committed, but a burst will induce jitter that
directly, and mostly, affects only the source that emitted the
burst.

WFQ provides for a great degree of isolation, so that
sources are protected from other sources' bursts. Is this the
best approach to obtaining the lowest play-back point when a
number of sources are sharing a link? We argue that this
isolation, while necessary for providing guaranteed service, is
counterproductive for predicted service.

The nature of play-back real-time applications allows the
scheduling algorithm to delay all packets up to the play-back
point without adversely affecting the application's performance.
Thus, one can think of the play-back point as a deadline. For
such problems, the standard earliest-deadline-first scheduling
algorithm, as described in Reference [17], has been proven
optimal. However, in our gedanken experiment the play-back
points are not set a priori, as in the above reference, but are
rather the result of the clients adapting to the current level
of delay.

scheduling    mean    99.9 %ile
WFQ           3.16      53.86
FIFO          3.17      34.72

Table 1: The mean and 99.9th percentile queueing delays
(measured in units of per-packet transmission time) for a sample
flow under the WFQ and FIFO scheduling algorithms. The link is
83.5% utilized.

Let us consider a simple example where a class of clients
have similar service desires. This implies that they are all
satisfied with the same delay jitter; thus they all have the
same play-back point and thus the same deadline. If the deadline
for each packet is a constant offset to the arrival time, the
deadline scheduling algorithm becomes, surprisingly, FIFO; the
packet that is closest to its deadline is the one that arrived
first. Hyman, Lazar, and Pacifici, in Reference [13], also make
this observation that FIFO is merely a special case of deadline
scheduling.
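
This equivalence is simple to check. The following Python sketch orders a hypothetical set of packets by earliest deadline, with each deadline taken as the arrival time plus a constant offset, and compares the result with plain arrival order. The packet names, arrival times, and offset are illustrative assumptions.

```python
def edf_order(packets, offset):
    """Earliest-deadline-first order, with deadline = arrival + offset."""
    return sorted(packets, key=lambda p: p[1] + offset)


def fifo_order(packets):
    """First-in-first-out order: sort by arrival time alone."""
    return sorted(packets, key=lambda p: p[1])


# (name, arrival time) pairs, deliberately listed out of order.
packets = [("c", 3.0), ("a", 1.0), ("b", 2.0)]
print(edf_order(packets, offset=50.0) == fifo_order(packets))
```

Adding the same constant to every sort key cannot change the relative order, so the two schedules coincide whatever the offset is.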

Consider what happens when we use the FIFO queueing
discipline instead of WFQ. Now when a burst from one source
arrives, this burst passes through the queue in a clump while
subsequent packets from the other sources are temporarily
delayed; this latter delay, however, is much smaller than the
delay that the bursting source would have received under WFQ.
Thus, the play-back point need not be moved out as far to
accommodate the jitter induced by the burst. Furthermore, the
particular source producing the burst is not singled out for
increased jitter; all the sources share in all the jitter
induced by the bursts of all the sources. Recall that when the
packets are of uniform size, the total queueing delay in any
time period (summed over all flows) is independent of the
scheduling algorithm. The FIFO algorithm splits this delay
evenly, whereas the WFQ algorithm assigns the delay to the flows
that caused the momentary queueing (by sending bursts). When the
delays are shared as in FIFO, in what might be called a
multiplexing of bursts, the post facto jitter bounds are smaller
than when the sources are isolated from each other as in WFQ.
This was exactly our goal; under the same link utilization, FIFO
allows a number of sources aggregating their traffic to obtain a
lower overall delay jitter.
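
The conservation argument can be illustrated with a toy slotted model. The Python sketch below is not our simulator: it assumes unit-size packets, one transmission per slot, and uses round robin as a simple isolating stand-in for equal-rate WFQ (an assumption made for brevity). The arrival pattern is hypothetical.

```python
from collections import deque


def simulate(arrivals, n_flows, choose):
    """Slotted single link: one unit-size packet is sent per slot.

    arrivals maps a slot number to the list of flow ids (0..n_flows-1)
    generating a packet in that slot. Returns per-flow waiting times.
    """
    queues = [deque() for _ in range(n_flows)]
    delays = [[] for _ in range(n_flows)]
    remaining = sum(len(v) for v in arrivals.values())
    state = {"last": -1}              # used only by round robin
    t = 0
    while remaining:
        for f in arrivals.get(t, []):
            queues[f].append(t)       # remember the arrival slot
        if any(queues):
            f = choose(queues, state)
            delays[f].append(t - queues[f].popleft())
            remaining -= 1
        t += 1
    return delays


def fifo(queues, state):
    # Serve the packet that arrived first (ties go to the lowest flow id).
    backlogged = [f for f, q in enumerate(queues) if q]
    return min(backlogged, key=lambda f: queues[f][0])


def round_robin(queues, state):
    # Isolating stand-in for equal-rate WFQ: visit backlogged flows in turn.
    n = len(queues)
    for step in range(1, n + 1):
        f = (state["last"] + step) % n
        if queues[f]:
            state["last"] = f
            return f


def total_delay(delays):
    return sum(sum(d) for d in delays)


def worst_delay(delays):
    return max(max(d) for d in delays)


# Flow 0 emits a burst of 6 packets; flows 1 and 2 send one packet every 3 slots.
arrivals = {0: [0] * 6 + [1, 2], 3: [1, 2], 6: [1, 2], 9: [1, 2]}
fifo_delays = simulate(arrivals, 3, fifo)
rr_delays = simulate(arrivals, 3, round_robin)
print("total:", total_delay(fifo_delays), total_delay(rr_delays))
print("worst:", worst_delay(fifo_delays), worst_delay(rr_delays))
```

Both disciplines are work-conserving, so the total delay is the same; they differ only in how that delay is divided, and the largest delay seen by any flow is smaller when the burst is shared.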

In order to test our intuition, we have simulated both
WFQ and FIFO algorithms. The Appendix contains a complete
description of our simulation procedure; we only present the
results here. We consider a single link being utilized by 10
flows, each having the same statistical generation process. In
Table 1 we show the mean and 99.9th percentile queueing delays
for a sample flow (the data from the various flows are similar)
under each of the two scheduling algorithms. Note that while the
mean delays are about the same for the two algorithms, the
99.9th percentile delays are significantly smaller under the
FIFO algorithm. This confirms our analysis above.

The FIFO queue discipline has generally been considered
ineffective for providing real-time service; in fact, it has
been shown in certain circumstances to be the worst possible
algorithm ([1]). The reason is that if one source injects
excessive traffic into the net, this disrupts the service for
everyone. This assessment, however, arises from a failure to
distinguish the two separate objectives of any traffic control
algorithm, isolation and sharing. Isolation is the more
fundamental goal; it provides guaranteed service for
well-behaved clients and quarantines misbehaving sources. But
sharing, if it is performed in the context of an encompassing
isolation scheme, serves the very different goal of mixing
traffic from different sources in a way that is beneficial to
all; bursts are multiplexed so that the post facto jitter is
smaller for everyone. The FIFO scheme is an effective sharing
scheme, but it does not provide any isolation. WFQ, on the other
hand, is an effective method for isolation. If we organize the
traffic into classes of clients with similar service
requirements, we find that this reasoning leads to a nested
scheme in which the queueing decision is in two steps: a first
step to ensure isolation of classes, and then a particular
sharing method within each class.

FIFO is not the only interesting sharing method. Another
sharing method is priority, which has a very different behavior
than FIFO. The goal of FIFO is to let every source in a common
class share equally in the jitter. In priority, one class
acquires the jitter of higher priority classes, which
consequently get much lower jitter. In one direction priority is
considered a sharing mechanism, but in the other it is an
isolation mechanism, i.e. lower priority traffic can never
affect the performance of higher priority traffic.

Why might a priority algorithm be of mutual benefit?
The benefit of lower jitter is obvious; the benefit of higher
jitter would presumably be a lower cost for the service. A
source with more tolerance for jitter (or for higher overall
delay) might be very happy to obtain a cheaper service in
exchange for taking the jitter of some other sources.

One can think in general of scheduling algorithms as
representing methods for jitter shifting, in which explicit
actions are taken to transfer the jitter among flows in a
controlled and characterized way. One could invent a wide range
of scheduling schemes that reorder the queue in specific ways,
as we discuss in the section on related work. They should all be
examined from two perspectives. First, how and to what extent do
they perform isolation? Second, how and to what extent do they
provide sharing?

6  Multi-Hop  Sharing 

One  of  the  problems  with  the  FIFO  algorithm  is  that  if 
we  generalize  our  gedanken  experiment  to  include  several 
links,  then  the  jitter  tends  to  increase  dramatically  with  the 
number  of  hops,  since  the  packet  has  a  separate  opportunity 
for  uncorrelated  queueing  delays  at  each  hop. 

In fact, it is not clear that this increase in jitter need
occur. Going through more hops provides more opportunities for
sharing, and hence more opportunities for reducing jitter. The
key is to correlate the sharing experience which a packet has at
the successive nodes in its path. We call this scheme FIFO+. In
principle, FIFO+ is very similar to the least slack scheduling
algorithms for manufacturing systems discussed in Reference
[18].

In FIFO+, we try to induce FIFO-style sharing (equal
jitter for all sources in the aggregate class) across all the
hops along the path to minimize jitter. We do this as follows.
For each hop, we measure the average delay seen by packets in
each priority class at that switch. We then compute for each
packet the difference between its particular delay and the class
average. We add (or subtract) this difference to a field in the
header of the packet, which thus accumulates the total offset
for this packet from the average for its class. This field
allows each switch to compute when the packet should have
arrived if it were indeed given average service. The switch then
inserts the packet in the queue in the order as if it arrived at
this expected time.
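
The per-hop bookkeeping can be sketched as follows. This Python fragment is a minimal illustration for one priority class at one switch, not the implementation used in our simulations. It assumes that the class average is tracked with an exponentially weighted moving average (the text above says only that the average is measured), that the header offset is positive for packets that have so far received worse-than-average service, and that each packet is compared against the average measured before its own delay is folded in. The class, method, and field names are hypothetical.

```python
import heapq
import itertools


class FifoPlusQueue:
    """One priority class at one switch, ordered by expected arrival time."""

    def __init__(self, gain=0.1):
        self.heap = []
        self.avg_delay = 0.0          # measured class average at this hop
        self.gain = gain              # EWMA weight (an assumption)
        self._seq = itertools.count() # tie-breaker keeps the heap stable

    def enqueue(self, packet, now):
        # A packet delayed more than average upstream (offset > 0) is
        # treated as if it had arrived earlier by that amount.
        expected_arrival = now - packet["offset"]
        heapq.heappush(self.heap, (expected_arrival, next(self._seq), now, packet))

    def dequeue(self, now):
        _, _, arrived, packet = heapq.heappop(self.heap)
        delay = now - arrived
        # Accumulate this hop's deviation from the class average.
        packet["offset"] += delay - self.avg_delay
        self.avg_delay += self.gain * (delay - self.avg_delay)
        return packet


q = FifoPlusQueue()
early = {"id": "early", "offset": -2.0}   # has been luckier than average so far
late = {"id": "late", "offset": 5.0}      # has been unluckier than average
q.enqueue(early, now=10.0)
q.enqueue(late, now=11.0)
print(q.dequeue(now=12.0)["id"])
```

Although the unlucky packet physically arrives second, its expected arrival time is earlier, so it is served first; this is how the scheme lets a packet recover at later hops from above-average delay at earlier ones.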

To test this algorithm, we have simulated its performance
on a network as shown in Figure 1. This network has four
equivalent 1 Mbit/sec inter-switch links, and each link is
shared by 10 flows. There are, in total, 22 flows; all of them
have the same statistical generation process (described in the
Appendix) but they travel different network paths. 12 traverse
only one inter-switch link, 4 traverse two inter-switch links, 4
traverse three inter-switch links, and 2 traverse all four
inter-switch links. Table 2 displays the mean and 99.9th
percentile queueing delays for a single sample flow for each
path length (the data from the other flows are similar). We
compare the WFQ, FIFO, and FIFO+ algorithms (where we have used
equal clock rates in the WFQ algorithm). Note that the mean
delays are comparable in all three cases. While the 99.9th
percentile delays increase with path length for all three
algorithms, the rate of growth is much smaller with the FIFO+
algorithm.

As the simulation shows, the effect of FIFO+, as compared
to FIFO, is to slightly increase the mean delay and jitter of
flows on short paths, and to slightly decrease the mean delay
and significantly decrease the jitter of flows on long paths,
which means that the overall delay bound goes down and the
precision of estimation goes up on long paths. When we compare
the implementation of the two schemes, they differ in one
important way: the queue management discipline is no longer
trivial (add the packet to the tail of the queue for the class)
but instead requires that the queue be ordered by deadline,
where the deadline is explicitly computed by taking the actual
arrival time, adjusting this by the offset in the packet header
to find the expected arrival time, and then using this to order
the queue. This has the possibility of a more expensive
processing overhead, but we believe that efficient coding
methods can implement this in software with the same performance
as current packet switches achieve.

We have now extended our predicted service class to
multiple hops, using FIFO+ as an explicit means to minimize the
jitter and to obtain as much benefit as possible from sharing.
Compare this service to the guaranteed service, where the
service is specified by the worst-case bounds and the focus is
on scheduling algorithms that provide isolation between the
various flows. In our gedanken experiment for predicted service,
we assume that (1) adequate isolation is being provided by the
enforcement of traffic filters before or at the entrance to the
network, and (2) the overall network conditions are not changing
rapidly. Here, the challenge is to share the link effectively in
a way that minimizes the play-back point. As we have seen, FIFO
is an effective sharing mechanism. The modification of FIFO+
merely extends the concept of sharing from sharing between flows
at a single hop to sharing between hops.

7  Unified  Scheduling  Algorithm 

In the previous three sections we have presented scheduling
algorithms that each handle a single kind of service commitment.
In this section we combine these algorithms into a unified
scheduling algorithm that handles guaranteed, predicted, and
datagram service.

Consider a set of real-time flows, some requesting
guaranteed service and some requesting predicted service, and
also a set of datagram sources. We first describe the scheduling
algorithm as implemented at each switch and then discuss how
this fits into our overall service architecture.

The scheduling algorithm at a single switch is quite
straightforward. The basic idea is that we must isolate the
traffic of the guaranteed service class from that of the
predicted service


Figure  1:  Network  topology  used  for  data  in  Table  2. 
              Path length 1     Path length 2     Path length 3     Path length 4
scheduling    mean  99.9 %ile   mean  99.9 %ile   mean  99.9 %ile   mean  99.9 %ile
WFQ            ...     ...      4.74     ...       ...     ...       ...     ...
FIFO          2.54    30.49     4.73    41.22     7.97    52.36    10.33    58.13
FIFO+         2.71    33.59     4.69    38.15     7.76    43.30    10.11    45.25

Table 2: The mean and 99.9th percentile queueing delays (measured in units of per-packet transmission time) for four
sample flows of different path lengths under the WFQ, FIFO, and FIFO+ scheduling algorithms. The network configuration
is shown in Figure 1. Each inter-switch link is 83.5% utilized.


class, as well as isolate guaranteed flows from each other.
Therefore we use the time-stamp based WFQ scheme as a framework
into which we fit the other scheduling algorithms. Each
guaranteed service client $\alpha$ has a separate WFQ flow with
some clock rate $r^\alpha$. All of the predicted service and
datagram service traffic is assigned to a pseudo WFQ flow, call
it flow 0, with, at each link, $r^0 = \mu - \sum_\alpha
r^\alpha$, where the sum is over all the guaranteed flows
passing through that link. Inside this flow 0, there are a
number of strict priority classes, and within each priority
class we operate the FIFO+ algorithm. Once we have assigned each
predicted service flow (and also the datagram traffic) to a
priority level at each switch, the scheduling algorithm is
completely defined. We now discuss how this algorithm fits into
our overall service architecture.
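
The structure of flow 0 can be sketched in Python as follows. This is an illustrative outline only: it computes $r^0 = \mu - \sum_\alpha r^\alpha$ and applies strict priority among the classes inside flow 0, and it assumes that each class queue is already held in FIFO+ order by some other component. The function names and the numerical rates are hypothetical.

```python
from collections import deque


def flow0_clock_rate(link_speed, guaranteed_rates):
    """Clock rate left for the pseudo flow 0: mu minus the guaranteed rates."""
    r0 = link_speed - sum(guaranteed_rates)
    if r0 < 0:
        raise ValueError("guaranteed clock rates exceed the link speed")
    return r0


def pick_from_flow0(priority_queues):
    """Strict priority inside flow 0.

    priority_queues is a list of deques, index 0 being the highest
    predicted service class and the last entry the datagram class.
    Each deque is assumed to be kept in FIFO+ order already.
    Returns (level, packet), or None if flow 0 is empty.
    """
    for level, queue in enumerate(priority_queues):
        if queue:
            return level, queue.popleft()
    return None


# Hypothetical link with three guaranteed flows.
print(flow0_clock_rate(1_000_000, [170_000, 170_000, 85_000]))

classes = [deque(), deque(["p1", "p2"]), deque(["d1"])]
print(pick_from_flow0(classes))
```

WFQ decides when flow 0 as a whole may transmit; only then is the choice among predicted service and datagram packets made by the priority rule above.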

We have discussed the function of the FIFO+ scheme
above. What is the role of the priority classes? Recall from
above that the effect of priority is to shift the jitter of
higher priority class traffic to the lower priority classes. We
assign datagram traffic to the lowest priority class. There are
$K$ other priority levels above the datagram priority level.

At the service interface, we provide $K$ widely spaced
target delay bounds $D_i$ for predicted service (at a particular
switch). The priorities are used to separate the traffic for the
$K$ different classes. These bounds $D_i$ are not estimates of
the actual delivered delays. Rather, they are a priori upper
bounds and the network tries, through admission policies, to
keep queueing delays at each switch for a particular class $i$
well below these bounds $D_i$. We mentioned earlier that
adaptive applications have limits to their adaptability; these
bounds $D_i$ are indicative of such limits. A predicted service
flow is assigned a priority level at each switch (not
necessarily the same level in every switch); the a priori delay
bound advertised to a predicted service flow is the sum of the
appropriate $D_i$ along the path. The delay bound advertised to
a guaranteed flow is the Parekh-Gallager bound.

This scheme has the problem that, since delay is additive,
asking for a particular $D_i$ at a given switch does not
directly mean that $D_i$ is the target delay bound for the path
as a whole. Rather, it is necessary to add up the target delays
at each hop to find the target upper bound for the path. We
expect the true post facto bounds over a long path to be
significantly lower than the sum of the bounds $D_i$ at each
hop. But we suggest that, since this is an adaptive service, the
network should not attempt to characterize or control the
service to great precision, and thus should just use the sum of
the $D_i$'s as the advertised bound.
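
As a small worked example, the advertised bound for a predicted service flow is just the sum of the per-switch targets for the priority levels it was assigned. The Python sketch below assumes two hypothetical classes whose targets are an order of magnitude apart, expressed in integer milliseconds; the values and names are illustrative only.

```python
def advertised_bound(class_bounds_ms, levels_along_path):
    """Sum of the per-switch target bounds D_i over the hops of a path.

    class_bounds_ms   -- mapping from priority level i to its target D_i
    levels_along_path -- the level assigned to the flow at each switch
    """
    return sum(class_bounds_ms[level] for level in levels_along_path)


# Hypothetical targets: level 1 = 5 ms, level 2 = 50 ms per switch.
bounds = {1: 5, 2: 50}

# A four-hop path where the flow got level 2 at the third switch only.
print(advertised_bound(bounds, [1, 1, 2, 1]))
```

Because the levels need not match from switch to switch, the bound depends on the whole assignment along the path rather than on a single class.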

Consider in more detail how the priority scheme works.
If the highest priority class has a momentary need for extra
bandwidth due to a burst by several of its sources, it steals
the bandwidth from the lower classes. The next class thus sees
as a baseline of operation the aggregate jitter of the higher
class. This gets factored together with the aggregate burstiness
of this class to produce the total jitter for the second class.
This cascades down to the datagram traffic, which gets whatever
bandwidth is left over and suffers from the accumulated jitter.
As we argue later, the datagram traffic should probably be given
an average rate of at least 10% or so, both to ensure that it
makes some progress on the average and to provide a reasonable
pool of bandwidth for the higher priority traffic to borrow from
during momentary overloads.

For a lower priority class, what source of jitter will
dominate its observed behavior: its intrinsic aggregate behavior
or the jitter shifted from the higher priority classes? If the
target goals for jitter are widely spaced (and for the purpose
of rough estimation as we suggested above they probably need be
no closer than an order of magnitude) then the exported jitter
from the higher priority class should be an order of magnitude
less than the intrinsic behavior of the class, and the classes
should usually operate more or less independently. Thus, a
particular class is isolated from the lower priority classes by
the priority scheduling algorithm and is in effect isolated from
the higher priority classes because their jitter will be so much
smaller than that of the particular class.

We have simulated this unified scheduling algorithm using
the same simulation configuration as used for Table 2, which has
22 real-time flows with identical statistical generation
processes but which traverse different network paths. To these
22 real-time flows we also added 2 datagram TCP connections. In
this simulation, 5 of the real-time flows are guaranteed service
clients; 3 of these have a clock rate equal to their peak packet
generation rate (we denote such flows by Guaranteed-Peak) and
the other 2 have a clock rate equal to their average packet
generation rate (we denote such flows by Guaranteed-Average).
See the Appendix for details on the statistical generation
process and the values of the average and peak rates. The
remaining 17 real-time flows are predicted service clients
served by two priority classes: 7 flows are in the high priority
class (we denote such flows by Predicted-High) and the other 10
flows are in the low priority class (we denote such flows by
Predicted-Low). If we look at the traffic traversing each link,
it consists of one datagram connection and 10 real-time flows: 2
Guaranteed-Peak, 1 Guaranteed-Average, 3 Predicted-High, and 4
Predicted-Low.

Table 3: The queueing delay measurements of several sample flows in simulating the unified scheduling algorithm. The network
configuration is shown in Figure 1. Each inter-switch link is utilized over 99%.

Sample results of the simulation are presented in Table 3
(where P-G bound is the computed Parekh-Gallager delay bound).
We see that all of the guaranteed service flows received
worst-case delays that were well within the Parekh-Gallager
bounds. Not surprisingly, the Guaranteed-Peak flows experienced
much lower delays than the Guaranteed-Average flows. Similarly,
the Predicted-High flows experienced lower delays than the
Predicted-Low flows. For the given load pattern described here,
the delays of the Predicted-High flows were lower than those of
the comparable Guaranteed-Peak flows, and the delays of the
Predicted-Low flows were lower than those of the comparable
Guaranteed-Average flows; however, this relation between the
delays of the two classes is an artifact of the particular load
pattern and is not necessarily indicative of a general pattern.

Not shown in Table 3 is the performance of the datagram
traffic, which experienced a very low drop rate, around
0.1%. The overall utilization of the network was over 99%,
with 83.5% of this being real-time traffic. It is important to
note that if all of the real-time flows had requested
guaranteed service with a clock rate equal to their peak rate, the
network could accommodate many fewer real-time flows and
the utilization due to real-time traffic would be reduced to
roughly 50%. Thus, providing predicted service allows the
network to operate with a higher level of real-time traffic
than would be allowed by a pure guaranteed service offering
the same delay bounds. These results, though woefully
incomplete, are qualitatively consistent with our analysis.

We are currently attempting to more fully validate our
design through simulation, and we hope to report on our
progress in a subsequent publication. Note that much of the
challenge here is determining how to evaluate our proposal.
There is no widely accepted set of benchmarks for real-time
loads, and much of the novelty of our unified scheduling
algorithm is our provision for predicted service, which can
only be meaningfully tested in a dynamic environment with
adaptive clients.

We have now completed the first parts of our architecture.
We have described a model for the low-level packet
forwarding algorithm, which is a sharing discipline inside an
isolation discipline, and we have provided a particular
example of such a scheme, which provides both of our service
commitment models, guaranteed and predicted. The scheme
provides several predicted service classes with different delay
bounds, and uses a particular technique (FIFO+) to provide
low jitter, and to provide a jitter bound that does not vary
strongly with the number of hops in the paths.

8  Service  Interface 

As a part of the definition of the unified scheduling
algorithm, we have also defined our service interface. In fact,
there are two forms for the service interface, one for
guaranteed service and another for predicted service.

For guaranteed service, the interface is simple: the source
need only specify the required clock rate $r^\alpha$, and the
network guarantees this rate. The source uses its known
value for $b^\alpha(r^\alpha)$ to compute its worst-case queueing
delay. If the delay is unsuitable, it must request a higher clock
rate $r^\alpha$. The network does no conformance check on any
guaranteed flow, because the flow does not make any traffic
characterization commitment to the network.
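
As an illustration of how a source might carry out this
computation, the following minimal Python sketch derives b(r)
from a recorded packet trace and evaluates the delay as b(r)/r.
It assumes the fluid-approximation form of the Parekh-Gallager
bound for a flow given the same clock rate at every switch; the
function names, the trace, and the candidate rates are
hypothetical and are not part of our design.

```python
def min_bucket_depth(arrivals, rate):
    """Smallest depth b(r), in packets, for which the trace conforms
    to an (r, b) token bucket. arrivals: sorted times in seconds."""
    deficit = 0.0   # tokens currently missing from a full bucket
    worst = 0.0
    last = None
    for t in arrivals:
        if last is not None:
            # Tokens refill at `rate` while the source is silent.
            deficit = max(0.0, deficit - rate * (t - last))
        last = t
        deficit += 1.0          # each packet consumes one token
        worst = max(worst, deficit)
    return worst


def delay_bound(arrivals, rate):
    """Assumed worst-case queueing delay b(r)/r, in seconds."""
    return min_bucket_depth(arrivals, rate) / rate


def pick_clock_rate(arrivals, candidate_rates, target_delay):
    """Lowest candidate rate whose bound meets the target, else None."""
    for rate in sorted(candidate_rates):
        if delay_bound(arrivals, rate) <= target_delay:
            return rate
    return None


trace = [0.0, 0.0, 0.0, 1.0]     # hypothetical trace: a burst of 3, then 1
print(delay_bound(trace, 2.0))   # bound at a clock rate of 2 packets/sec
print(pick_clock_rate(trace, [2.0, 4.0, 8.0], target_delay=1.0))
```

The loop in pick_clock_rate mirrors the interface described
above: if the delay at one clock rate is unsuitable, the source
simply asks for a higher rate.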

For predicted service, the service interface must
characterize both the traffic and the service. For the
characterization of the traffic we have the source declare the parameters
(r, b) of the token bucket traffic filter to which it claims its
traffic will conform. Note that in the guaranteed case the
client did not need to inform the network of its bucket size b.
Separately, the source must request the needed service. This
involves selecting a suitable delay D and a target loss rate
L the application can tolerate. The network will use these
numbers to assign the source to an aggregate class at each
switch for sharing purposes. Thus, for predicted service, the
parameters of the service interface are the filter rate and size
(r, b) and the delay and loss characteristics (D, L).

To provide predicted service, the network must also
enforce the traffic commitments made by the clients.
Enforcement is carried out as follows. Each predicted service flow
is checked at the edge of the network (i.e., the first switch
the traffic passes through) for conformance to its declared
token bucket filter; nonconforming packets are dropped or
tagged. This conformance check provides the necessary
isolation that is a mandatory ticket for entering a shared world.
After that initial check, conformance is never enforced at
later switches; this is because any later violation would be
due to the scheduling policies and load dynamics of the
network and not the generation behavior of the source.
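
The edge conformance check can be sketched as follows. This is
a minimal illustration in Python, assuming a bucket that starts
full and charges one token per packet; the class name and the
choice to return a boolean (leaving the drop-or-tag decision to
the caller) are our own assumptions for the example.

```python
class TokenBucketFilter:
    """(r, b) token bucket: r tokens/sec, depth b tokens, 1 token/packet."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth      # assumption: the bucket starts full
        self.last = None         # time of the previous packet

    def conforms(self, now):
        """Return True if a packet arriving at `now` conforms."""
        if self.last is not None:
            refill = self.rate * (now - self.last)
            self.tokens = min(self.depth, self.tokens + refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False             # caller drops or tags the packet


# Hypothetical flow declared as (r, b) = (1 packet/sec, 2 packets).
edge = TokenBucketFilter(rate=1.0, depth=2.0)
for t in (0.0, 0.0, 0.0, 1.0):
    print(t, "conforming" if edge.conforms(t) else "drop or tag")
```

Only the first switch on the path would hold such a filter for
the flow; later switches forward whatever survives the check.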

In the case of the predicted service, specifying the token
bucket traffic filter also permits the network to estimate if it
can carry the new source at the requested rate and burstiness
and still meet the service targets for this, and all of the
existing, flows. This is the function of the last part of the
architecture, the flow admission control computation.

9  Admission  Control 

While we stated earlier that we would not address the
negotiation process for the establishment of service
commitments, we must at least address the conditions under which
a network accepts or denies a request for service, without
necessarily specifying the exact dynamics of that exchange.

There are two criteria to apply when deciding whether
or not to admit additional flows into the network. The first
admission control criterion is that we should reserve no more
than 90% of the bandwidth for real-time traffic, letting the
datagram traffic have access to at least 10% of the link; while
the numerical value, 10%, of this quota is completely ad hoc
and experience may suggest other values are more effective,
we do believe that it is crucial to have such a quota. This
quota ensures that the datagram service remains operational
at all times; having the datagram traffic completely shut out
for arbitrarily long periods of time will likely put impossible
demands on the datagram transport layers. In addition, the
datagram quota ensures that there is enough spare capacity
to accommodate sizable fluctuations in the predicted service
traffic. The second admission control criterion is that we
want to ensure that the addition of a flow does not increase
any of the predicted delays over the bounds $D_i$.

We now give an example, albeit superficial, of how one
might make these criteria specific. Let $\hat{\nu}$ denote the
measured post facto bound on utilization on a link due to
real-time traffic (in general, the hat symbol denotes measured
quantities), let $\hat{d}_i$ denote the measured maximal delay of
the traffic in class i, and let $\mu$ denote the link speed. In
this example admission control criterion, a flow promising
to conform to a token bucket traffic filter (r, b) can be
admitted to priority level i if (1) $r + \hat{\nu} < 0.9\mu$, and
(2) $b < (D_j - \hat{d}_j)(\mu - \hat{\nu} - r)$ for each class j
which is lower than or equal in priority to level i. For the
purposes of the computation in (2), a guaranteed service
commitment is considered to be higher in priority than all levels i.
The first condition guarantees that there is at least 10% of the
link left over for datagram traffic. The second condition is a
heuristic designed to ensure that the delays will not violate the
bounds $D_j$ once the new flow is admitted even if the new flow
displays worst-case behavior. The key to making the predictive
service commitments reliable is to choose appropriately
conservative measures for $\hat{\nu}$ and $\hat{d}_j$; these should
not just be averages but consistently conservative estimates.
Knowing how conservative to make $\hat{\nu}$ and $\hat{d}_j$ may
involve historical knowledge of the size of fluctuations in
network traffic and delay on various links.
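
The two example conditions can be written down directly. The
sketch below is only an illustration in Python: it names the
measured real-time utilization nu_hat, the link speed mu, the
class delay targets D and the measured maximal class delays
d_hat, and it assumes classes are indexed from 0 (highest
priority) upward. All numerical values are hypothetical.

```python
def admit_predicted(r, b, level, nu_hat, mu, D, d_hat):
    """Example admission test for a predicted service flow.

    r, b    declared token bucket rate (packets/sec) and depth (packets)
    level   requested priority level (0 = highest priority)
    nu_hat  measured real-time utilization of the link (packets/sec)
    mu      link speed (packets/sec)
    D       delay targets per class (seconds), indexed by level
    d_hat   measured maximal delay per class (seconds)
    """
    # Condition (1): keep at least 10% of the link for datagram traffic.
    if not (r + nu_hat < 0.9 * mu):
        return False
    # Condition (2): a worst-case burst of b packets must fit in the
    # delay slack of every class at the same or lower priority.
    for j in range(level, len(D)):
        if not (b < (D[j] - d_hat[j]) * (mu - nu_hat - r)):
            return False
    return True


# Hypothetical link: 1000 packets/sec, 600 of which carry real-time traffic.
D = [0.10, 0.40]
d_hat = [0.03, 0.15]
for level in (0, 1):
    ok = admit_predicted(r=85.0, b=50.0, level=level,
                         nu_hat=600.0, mu=1000.0, D=D, d_hat=d_hat)
    print("level", level, "admit" if ok else "reject")
```

With these made-up numbers the flow is rejected at the high
priority level, where the slack of 0.07 sec absorbs only about
22 packets, but is admitted at the low priority level.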

This example is overly sketchy, and we have yet to simulate
to see how this particular implementation of admission
control would function in a dynamic network. We offer it
solely as an illustration of the considerations involved in
designing an admission control policy. It is clear that the
viability of our proposal rests on our ability to formulate
an admission control policy which will make the predicted
service class sufficiently reliable; specifying and validating
such an admission control policy is the focus of our current
work.

We have the following additional general comments on
admission control policies. It is not clear how precise such
an algorithm needs to be. If there is enough bandwidth to
meet most customer needs, and if only a small fraction of
traffic needs the most demanding of the predicted service,
then a rough estimate may be adequate. In addition, we
are offering a general method which involves measuring the
behavior of the existing real-time traffic, rather than using
the traffic model specified in the service interface, in
deciding whether to admit new traffic. We use the worst-case
traffic model only for the new source, which we cannot
otherwise characterize; once the new flow starts running, we
will be able to measure the aggregate traffic with the new
flow and base further admission decisions on the most recent
measurement. This approach is important for two reasons.
First, since the sources will normally operate inside their
limits, this will give a better characterization and better
link utilization. Second, it matches what the clients
themselves are doing, as they adapt the playback point to the
observed network traffic. Having the network and the
endpoints assess the traffic in similar ways is more likely to
produce reasonable behavior.

10  Other  Service  Qualities 

There are a number of other service features that have been
proposed in the context of real-time services. Here we wish
to mention them, although we do not discuss exactly how
to support them in the context of our scheme.

One goal is that if overload causes some of the packets
from a source to miss their deadline, the source should be
able to separate its packets into different classes, to control
which packets get dropped. This idea can be incorporated
into our scheme by creating several priority classes with the
same target $D_i$. Packets tagged as "less important" go into
the lower priority class, where they will arrive just behind
the more important packets, but with higher priority than
the classes with larger $D_i$. It is obvious that use of priority
here can create a range of policies.

Another proposed service is that packets that are
sufficiently late should be discarded internally, rather than
being delivered, since in delivering them the network may use
bandwidth that could have been better used to reduce the
delay of subsequent packets. The offset carried in the packet
in the FIFO+ scheme provides precisely the needed
information: if a packet accumulates a very large jitter offset, it
is a target for immediate discarding. This idea has been
proposed elsewhere ([21]) but we observe that it fits naturally
into the FIFO+ scheme.

A third service is that packets should be buffered in the
network if they might otherwise arrive early (before the
playback point) so that the end-node need not provide the
buffering or estimate the current delay. We are not convinced that
this service is useful in general. With current memory costs,
buffering does not seem expensive. And while it might seem
nice for the network to relieve the destination equipment
from the need to estimate the delay, it cannot eliminate the
need for the end to adapt to a change in the delay. The
way in which the adaptation is done is application specific,
and must drive the decision as to when to change the actual
playback point. Once we give the destination enough
control to perform this act, it seems obvious that it is just as
simple to have it perform the delay estimation as well.

11  Related  Work 

There has been a flurry of recent work on supporting
real-time traffic in packet networks. We cannot hope to cover
all of the relevant literature in this brief review; instead, we
mention only a few representative references.

Though the WFQ scheduling algorithm was first described
in Reference [4], Parekh and Gallager were the first to
observe that, when the weights are chosen appropriately and
the traffic sources conform to token bucket filters, the
scheduling algorithm provides guaranteed service. WFQ is similar
in spirit, though not in detail, to the Delay-EDD scheme
proposed in Reference [7] and the MARS scheme proposed in
References [12, 13], in that the use of a deadline for
scheduling in Delay-EDD and MARS is analogous to the virtual
departure time-stamps used in WFQ. However, the
algorithms used to compute the time-stamps/deadlines are quite
different in the three algorithms. Furthermore, the
algorithms use rather different traffic filters to provide bounds.
Delay-EDD uses peak-rate limits (and a condition on the
average rate) whereas WFQ uses token buckets to provide
guaranteed bounds. MARS has no explicit traffic filters and
does not provide guaranteed bounds (i.e., no bounds that are
independent of the other sources' behavior); rather, MARS
has been shown through simulation with a particular set of
statistical sources to obey certain post facto bounds.

WFQ, Delay-EDD, and MARS are work-conserving
scheduling algorithms, in that the link is never left idle if there are
packets in the queue. Several non-work-conserving
scheduling algorithms have been proposed; for example,
Stop-and-Go queueing ([8, 9]), Hierarchical Round Robin ([16]), and
Jitter-EDD ([22]). All of these bear a superficial similarity
to WFQ in that packets are scheduled according to some
deadline or frame; the difference is that the packets are not
allowed to leave early. These algorithms typically deliver
higher average delays in return for lower jitter. See the
review studies [24, 27] for a more detailed comparison of these
schemes.

The Jitter-EDD ([6, 22]) algorithm makes use of a
delay field in the packet header to inform scheduling
decisions, much like the FIFO+ algorithm. Also, we should note
that the MARS scheduling algorithm uses FIFO scheduling
within a class of aggregated traffic in a fashion very similar
to our use of FIFO within each predicted service class.
Furthermore, Reference [13] makes the same observation that
deadline scheduling in a homogeneous class leads to FIFO.
Reference [12] also observed that strict priority does not
permit as many sources to share a link as a scheme that more
actively manages jitter shifting. This work thus represents
an example of queue management to increase link loading,
as opposed to expanded service offerings.

The general architecture of most of the proposals in the
literature, with Delay-EDD, Jitter-EDD, and HRR being
examples, focuses primarily on the delivery of what we have
called guaranteed service to real-time traffic (with datagram
traffic comprising the rest of the network load). Therefore
the designs of the scheduling algorithms have been mainly
focused on performing isolation among flows, with MARS
being an exception. MARS promotes sharing within a traffic
class by FIFO, and among different classes by a somewhat
more complex scheme. Due to lack of isolation, however,
MARS does not provide guaranteed service. The MARS
algorithm and Statistical-EDD ([7]) attempt to
achieve statistical bounds, but these bounds are still
computed a priori (either through analytical approximation or
through the simulation of a particular statistical source).
Implicit in these proposals is the assumption that all
real-time network clients are, in our taxonomy, intolerant
and rigid. While the worst-case guaranteed bounds
delivered by these mechanisms are appropriate for intolerant and
rigid clients, we have argued that there will likely be many
real-time clients who are both tolerant and adaptive.

There is only one other general architecture that has, as
one of its goals, the delivery of service more appropriate for
these tolerant and adaptive clients (and which we have called
predicted service); this is an unpublished scheme due to
Jacobson and Floyd which is currently being simulated and
implemented. Their work shares with our predicted service
mechanism the philosophy of measuring the current offered
load and delivered service in order to decide if new service
requests should be granted. Furthermore, their scheme also
involves the use of priorities as a combined sharing/isolation
mechanism. In contrast to our scheme, their scheme uses
enforcement of traffic filters at every switch as an additional
form of isolation, and they use round-robin instead of FIFO
within a given priority level^. Moreover, there is no
provision for guaranteed service in their mechanism.

References [10, 11] present admission control policies
involving the concept of equivalent capacity and then discuss
traffic filters (those references use the term access controls)
related to those admission control policies. While much of
the work is analytical, they also raise the possibility of using
measurements of current network conditions to inform the
various control policies.

12  Conclusion 

This paper contains two contributions: an architecture and
a mechanism. Our architecture is perhaps the more
fundamental piece, in that it defines the problem and provides a
framework for comparing various mechanistic alternatives.
The main novelty of our architecture, which arose from the
observation that many real-time applications can be made
adaptive, is the explicit provision for two different kinds of
service commitments. The guaranteed class of service is
the traditional real-time service that is discussed in much of
the literature. Guaranteed service is based on
characterization of source behavior that then leads to static worst-case
bounds. The predicted class of service, which is designed for
adaptive and tolerant real-time clients, is explicitly spelled
out here for the first time in the published literature. It
replaces traffic characterization with measurement in network
admission control. It also suggests applications replace static
bounds with adaptation in setting the play-back point. We
conjecture that with predictive service and adaptive clients we
can achieve both higher link utilizations and superior
application performance (because the play-back points will be at
the de facto bounds, not the a priori worst-case bounds).

^More  specifically,  they  combine  the  traffic  in  each  priority  level 
into  some  number  of  aggregate  groups,  and  do  FIFO  within  each 
group  (they  use  the  term  class,  but  in  this  paper  we  have  used  that 
term  with  a  different  meaning)  and  round-robin  among  the  groups. 
The  enforcement  of  traffic  filters  mentioned  above  is  applied  to  each 
group. 


Our mechanism is both an existence proof that our
architecture can be realized, and perhaps a useful artifact in
its own right. The mechanism's scheduling algorithms are
built around the recognition that the principles of isolation
and sharing are distinct and both play important roles when
sources are bursty and bandwidth is limited.

Isolation is fundamental and mandatory for any traffic
control algorithm. The network cannot make any
commitments if it cannot prevent the unexpected behavior of one
source from disrupting others. Sharing is important but not
fundamental. If bandwidth were plentiful, effective behavior
could be obtained by allocating to each source its peak rate;
in this case sharing need not be considered. Note, however,
that plentiful bandwidth does not eliminate the need for
isolation, as we still need to ensure that each source does not
use more than its allocated share of the bandwidth. Thus,
careful attention to sharing arises only when bandwidth is
limited. In environments like LANs, it may be more
cost-effective to over-provision than to implement intricate
sharing algorithms. One should therefore embed sharing into
the architecture only with caution.

We have proposed a particular scheme for sharing, which
seems general enough that we propose that the control field
(the jitter offset) be defined as part of the packet header.
But we note that, if a subnetwork naturally produces very
low jitters, it could just ignore the field and operate in some
simple mode like FIFO. When a subnetwork has these very
low natural jitters, it will not have enough queueing to
remove most of the accumulated jitter anyway, and the error
introduced by ignoring the field should be minor. Thus our
sharing proposal is half architecture and half engineering
optimization.

We conclude with one last observation: pricing must be a
basic part of any complete ISPN architecture. If all services
are free, there is no incentive to request less than the best
service the network can provide, which will not produce
effective utilization of the network's resources (see Reference
[3] for a discussion of these issues). The sharing model in
existing datagram networks deals with overload by giving
everyone a smaller share; the equivalent in real-time
services would be to refuse most requests most of the time,
which would be very unsatisfactory. Prices must be
introduced so that some clients will request higher jitter service
because of its lower cost. Therefore, real-time services must
be deployed along with some means for accounting.

It is exactly this price discrimination that will make the
predicted service class viable. Certainly predicted service is
less reliable than guaranteed service and, in the absence of
any other incentive, network clients would insist on
guaranteed service and the network would operate at low levels of
utilization and, presumably, high prices. However, if one can
ensure that the reliability of predicted service is sufficiently
high and the price sufficiently low, many network clients will
prefer to use the predicted service. This will allow ISPNs
to operate at a much higher level of utilization, which then
allows the costs to be spread among a much larger user
population.

14  Appendix

In our simulations, we use a network simulator written by
one of us (LZ) and used in a number of previous
simulation studies ([3, 25, 27]). The sources of real-time traffic are
two-state Markov processes. In each burst period, a
geometrically distributed random number of packets are generated
at some peak rate P; B is the average size of this burst.
After the burst has been generated, the source remains idle
for some exponentially distributed random time period; I
denotes the average length of an idle period. The average
rate of packet generation A is given by

$$A^{-1} = \frac{I}{B} + \frac{1}{P}$$

In all the simulations mentioned in this paper, we chose
B = 5 and set P = 2A (implying that I = B/2A), so that
the peak rate was double the average rate. Therefore, the
source is characterized by a single number A. Each traffic
source was then subjected to an (A, 50) token bucket filter
(50 is the size of the token bucket) and any nonconforming
packets were dropped at the source; in our simulations about
2% of the packets were dropped, so the true average rate was
around 0.98A.
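
For readers who wish to experiment with a comparable source,
the following Python sketch generates such a two-state process
and passes it through an (A, 50) token bucket. It is not the
simulator used for our results: the function names, the seed,
the number of bursts, and the details of how bursts are timed
are assumptions made for the illustration, so the drop fraction
it reports need not match the 2% quoted above.

```python
import random


def generate_arrivals(avg_rate, n_bursts, mean_burst=5.0, seed=0):
    """Arrival times (seconds) of a two-state source with P = 2A."""
    rng = random.Random(seed)
    peak = 2.0 * avg_rate
    # A burst of B packets takes B/P, followed by an idle period I,
    # so 1/A = I/B + 1/P and hence I = B * (1/A - 1/P).
    mean_idle = mean_burst * (1.0 / avg_rate - 1.0 / peak)
    p_stop = 1.0 / mean_burst        # geometric burst size with mean B
    t, arrivals = 0.0, []
    for _ in range(n_bursts):
        while True:
            arrivals.append(t)
            t += 1.0 / peak          # packets spaced at the peak rate
            if rng.random() < p_stop:
                break
        t += rng.expovariate(1.0 / mean_idle)
    return arrivals


def drop_fraction(arrivals, rate, depth):
    """Fraction of packets rejected by a (rate, depth) token bucket."""
    tokens, last, dropped = depth, arrivals[0], 0
    for t in arrivals:
        tokens = min(depth, tokens + rate * (t - last))
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
        else:
            dropped += 1             # nonconforming packet
    return dropped / len(arrivals)


arrivals = generate_arrivals(avg_rate=85.0, n_bursts=20000)
measured = len(arrivals) / arrivals[-1]
print(f"measured rate before filtering: {measured:.1f} packets/sec")
print(f"fraction dropped: {drop_fraction(arrivals, 85.0, 50.0):.3f}")
```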

In the networks we simulate, each host is connected to
the switch by an infinitely fast link. All inter-switch links
have bandwidths of 1 Mbit/sec, all switches have buffers
which can hold 200 packets, and all packets are 1000 bits.
All the queueing delay measurements are shown in units of
per packet transmission time (1 msec) and all data is taken
from simulations covering 10 minutes of simulated time.

For the data in Table 1, we simulated a single-link
network; there were 10 flows sharing the link, and the value of
A was 85 packets/sec for all flows. The data in Table 2 is
based on the configuration in Figure 1 which has 5 switches,
each attached to a host, and four inter-switch links. There
are 22 flows, with each host being the source and/or receiver
of several flows, and all of the network traffic travelling in
the same direction. Each inter-switch link was shared by 10
flows. There were 12 flows of path length one, 4 flows of
path length two, 4 flows of path length three, and 2 flows of
path length four. The value of A was 85 packets/sec for all
flows.
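
As a quick consistency check on this configuration, the path
lengths account for both the number of flows and the load on
each link. The arithmetic below uses only the figures stated
above.

```latex
\begin{align*}
\text{flows:} \quad & 12 + 4 + 4 + 2 = 22, \\
\text{link traversals:} \quad & 12 \cdot 1 + 4 \cdot 2 + 4 \cdot 3 + 2 \cdot 4 = 40
  = 4 \text{ links} \times 10 \text{ flows per link}.
\end{align*}
```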